Cumulative route improvements spontaneously emerge in artificial navigators even in the absence of sophisticated communication or thought

Homing pigeons (Columba livia) navigate by solar and magnetic compass, and fly home in idiosyncratic but stable routes when repeatedly released from the same location. However, when experienced pigeons fly alongside naive counterparts, their path is altered. Over several generations of turnover (pairs in which the most experienced individual is replaced with a naive one), pigeons show cumulative improvements in efficiency. Here, I show that such cumulative route improvements can occur in a much simpler system by using agent-based simulation. Artificial agents are in silico entities that navigate with a minimal cognitive architecture of goal-direction (they know roughly where the goal is), social proximity (they seek proximity to others and align headings), route memory (they recall landmarks with increasing precision), and continuity (they avoid erratic turns). Agents’ behaviour qualitatively matched that of pigeons, and quantitatively fitted to pigeon data. My results indicate that naive agents benefitted from being paired with experienced agents by following their previously established route. Importantly, experienced agents also benefitted from being paired with naive agents due to regression to the goal: naive agents were more likely to err towards the goal from the perspective of experienced agents’ memorised paths. This subtly biased pairs in the goal direction, resulting in intergenerational improvements of route efficiency. No cumulative improvements were evident in control studies in which agents’ goal-direction, social proximity, or memory were lesioned. These 3 factors are thus necessary and sufficient for cumulative route improvements to emerge, even in the absence of sophisticated communication or thought.


Introduction
Cumulative cultural evolution occurs when individuals pass down adaptive innovations through social means (e.g., teaching or copying), leading to progressive increases in fitness over generations [1,2].In humans, this "ratcheting" [3] of socially transmitted improvements is vital to human technological advancement [4] and has historically been attributed to uniquely human "high-fidelity" communication [5].However, experimental work has shown simple emulative learning is sufficient for cumulative culture to occur [6,7].Some argue that other species show cumulative culture, e.g., in transmission chains of songs in zebra finches [8] and humpback whales [9], tool use in crows [10] and chimpanzees [11][12][13], and pattern reproduction in great tits [14] and baboons [15].One particularly striking example comes from pigeons, which seem to pass down route improvements [16].
Homing pigeons (Columba livia) are suboptimal navigators that develop and remember idiosyncratic routes when flying alone or in pairs [17].While paired birds fly more efficient routes than individuals [18], pairs in which experienced pigeons are swapped for naive ones show "innovation": beneficial modifications between generations [16] that meet criteria for cumulative culture [19].Perhaps pigeons pool information between individuals, learn and decide through collective intelligence, and evaluate performance to prune worse innovations [16]; or develop intra-pair dynamics of communication and leadership [20].
An alternative explanation is that cumulative route improvements emerge as an accidental by-product of navigation in groups. Here, I directly address this question using a minimal cognitive architecture in artificial agents that are bound by only 4 rules derived from avian navigation (Fig 1). The first is goal direction, akin to birds' solar [21] and magnetic compasses [22] that allow them to orient towards their home even from unfamiliar release sites unless under total overcast with disorienting magnets glued to their head [23]. The second is social proximity, which birds seek when flying together [24]. The third is route memory, which in pigeons could depend on visual landmarks [25] and improves over consecutive flights [26]. The fourth is continuity, a tendency to continue along the current heading to avoid implausibly erratic patterns. Crucially, there is no communal decision-making, evaluation of outcomes, or deliberate social communication.
The artificial navigator model is a weighted mixture of Von Mises distributions F, with weights w that add up to 1 (Eq 1). These are akin to normal distributions, but they are circular, so that the tails wrap around. To produce the next heading h in journey i at time t+1, an agent combines information from time t on bearings towards the goal b_goal, the next memorised landmark b_landmark, and other agents' estimated future position b_other. As in birds, not all bearings are equally precise, which is reflected in each component's precision parameter κ. For example, there is uncertainty about where the (solar/magnetic compass) goal is [21,22], whereas pigeon visual acuity is good enough [27,28] to identify nearby visual landmarks along a well-memorised route (although they are not always used [29]). To prevent unnaturally jerky movements, the final component ensures continuity by sampling from a narrow distribution that is centred on the current heading. For a full account of the algorithm, please refer to Materials and methods.
Agents travelled in 3 conditions that mapped onto work in pigeons [16]: solo, paired, and an experimental condition with generational turnover. In the solo and pair conditions, 1 or 2 agents made 60 consecutive journeys. The experimental condition also involved pairs, but a naive agent replaced an experienced agent every 12 journeys. A total of 50 repetitions were done for each condition and set of weight parameters. Precision parameters were fixed at κ_continuity = 8.69 (equivalent SD = 0.35), κ_goal = 1.54 (1.0), κ_social = 2.18 (0.80), κ_memory,1 = 0.85 (0.9) to κ_memory,5 = 6.78 (0.40), based on model fits for pigeon data. Note that memory precision improves over journeys, as per evidence from pigeon flight [26].
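The pairing of precision parameters with "equivalent SD" values can be illustrated with a short sketch. It assumes the conversion is the standard circular standard deviation of a Von Mises distribution, sqrt(-2 ln(I1(κ)/I0(κ))); the text does not state which conversion was used, and the function names below are hypothetical.

```python
import math


def bessel_ratio(kappa, terms=80):
    """Return I1(kappa) / I0(kappa) using the power series of the
    modified Bessel functions of the first kind."""
    q = (kappa / 2.0) ** 2
    term0 = 1.0          # k = 0 term of the I0 series
    term1 = kappa / 2.0  # k = 0 term of the I1 series
    i0, i1 = term0, term1
    for k in range(terms):
        term0 *= q / ((k + 1) ** 2)
        term1 *= q / ((k + 1) * (k + 2))
        i0 += term0
        i1 += term1
    return i1 / i0


def circular_sd(kappa):
    """Circular standard deviation (radians) for precision kappa.
    Assumption: this is the 'equivalent SD' meant in the text."""
    if kappa <= 0:
        return math.inf  # kappa = 0 is a uniform distribution on the circle
    return math.sqrt(-2.0 * math.log(bessel_ratio(kappa)))


for name, kappa in [("continuity", 8.69), ("goal", 1.54), ("social", 2.18)]:
    print(name, kappa, round(circular_sd(kappa), 2))
```

Under this assumption, larger κ means a narrower distribution, which is why the continuity component (κ = 8.69) constrains headings far more tightly than the goal component (κ = 1.54).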

What is (not) cumulative culture?
Humans' accumulated innovations are undeniably "superlative" [30] and many depend on combining physical phenomena ("Type II" cultural evolution), whereas nonhuman innovations typically optimise only within a phenomenon ("Type I") [4]. An appealing framework describes 4 core criteria of cumulative cultural evolution: behaviour needs to (1) show variation introduced by interaction between individuals; (2) be passed on through social learning; (3) improve performance; and (4) repeat over generations [19]. Few examples of "cumulative culture" in animals meet all 4, and rarely are extended criteria (functional dependence, diversification into lineages, recombination across lineages, exaptation, or niche construction) met [19].
Route improvements in successive generations of pigeons were described as cumulative culture [16], and it was indeed listed as meeting all core criteria of the aforementioned framework [19].However, whether animals genuinely show cumulative culture is controversial.An alternative explanation is that individuals attend to others' actions, and then reinnovate a "latent solution" to produce similar outcomes [5].Alternatively, apparent innovations could have previously been unobserved or learned not socially but in response to changing environments [31].It is hotly debated whether these alternatives are valid and relevant; see [32] and the numerous responses for an overview of current opinions.
The agents employed in this study arguably do not meet the above standard.Their "innovation" is limited to an increase in efficiency, which is decidedly unlike the development of novel behaviour.While focussing on task efficiency offers insight into cumulative cultural evolution [33], a focus on task solutions can obscure that humans also actively discover new problems and generalise solutions between them, which nonhumans rarely do [34].Agents also do not engage in "social learning" as it is traditionally defined: all they do is follow other individuals, without explicit demonstration or observation of a concrete task.Hence, I will refer to their outcome as "cumulative route improvements."

Results
Artificial navigators travelled in an "experimental" condition with generational turnover (pairs with replacement of an experienced for a naive individual each generation) or in control conditions without turnover (paired or solo). They showed various levels of route efficiency (Figs 2 and A in S1 File), which was computed as start-goal distance divided by travelled distance [16], and ranged from 0 (never reached the goal) to 1 (straight line from start to goal). Parameters could be optimised for final-route efficiency or for intergenerational improvement (Fig 2). Cumulative route improvement was quantified as the increase in route efficiency between each generation. This occurred exclusively in the experimental condition (Figs A, B, and D in S1 File), replicating empirical data [16].
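The efficiency measure can be sketched as follows, assuming a route is stored as a list of (x, y) points and that a journey that never reaches the goal is scored as 0. The function names and the `reached_goal` flag are hypothetical.

```python
import math


def path_length(points):
    """Total travelled distance along consecutive (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def route_efficiency(points, start, goal, reached_goal=True):
    """Start-goal distance divided by travelled distance.
    Returns 0.0 if the goal was never reached, 1.0 for a straight line."""
    if not reached_goal:
        return 0.0
    travelled = path_length(points)
    if travelled == 0:
        return 0.0
    return math.dist(start, goal) / travelled


# A detour via (3, 4) covers 10 units for a 6-unit start-goal distance.
print(route_efficiency([(0, 0), (3, 4), (6, 0)], (0, 0), (6, 0)))
```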

Naive individuals can benefit from the experienced
In the experimental condition, naive individuals could benefit from following an experienced agent with established route memory.Compared to the pair control condition, naive individuals showed more efficient paths (Fig 3) if their experienced counterpart relied more strongly on memory (w memory ).However, naive agents were worse off at low memory-reliance, particularly if the relative influence of goal-direction (w goal ) was low, and if they more strongly sought social proximity (w social ).

Experienced individuals benefit from the naive
While it is perhaps obvious that naive agents could benefit from following established paths, more surprising was that experienced individuals also benefitted from their naive counterparts.This occurred due to regression to the goal.Compared to extreme samples, random samples are more likely to be nearer a distribution's centre; this is regression to the mean.Similarly, experienced agents draw from internal distributions, including for goal-direction and route memory.Naive agents sample from internal distributions too, but do not have route memory yet, and hence are more biased towards the goal than experienced individuals.Because agents aim for social proximity, naive navigators should thus subtly pull experienced agents towards the goal.
This was borne out empirically, as relative bearings for experienced towards naive agents were more likely to also be in the direction of the goal (Fig 4). This was primarily true for lower values of w_memory and increased with w_social. Regression to the goal thus allowed naive agents to memorise slightly more efficient routes than their paired experienced agent.

Control experiments with lesioned agents
Agents were lesioned in control experiments to investigate which navigation components were necessary for cumulative route improvements to emerge (Table 1). Control experiments employed the same weights as those that achieved highest final efficiency or intergenerational efficiency increases in the experimental condition (described above). However, headings were sampled from uniform distributions (i.e., noise) instead of being influenced by goal-direction, social proximity, route memory, or continuity. When goal-direction was lesioned for all generations, agents engaged in random walks that failed to reach the goal in time. When goal-direction was lesioned for all but the first generation, a path could be established within the first generation. This isolated goal-direction's necessity for intergenerational improvement, which should not occur with lesioned goal-direction if it is dependent on regression to the goal.
When social proximity or route memory was lesioned, efficiency was barely reduced, but generational increase was nullified. When continuity was lesioned, efficiency was greatly reduced, but the pattern of generational increases remained intact: present in the experimental condition, but not in pair or solo controls.

Table 1. Efficiency quantifies how close agents were to the direct path from start to goal. Final efficiency is measured in the last generation and generational increase as the difference between generations. Cumulative route improvements are reflected by a positive intergenerational increase and occur in the "experimental" condition ("pair" and "solo" are control conditions). The "no lesion" column reflects optimal scores; the other columns reflect scores after replacing the goal, social, memory, or continuity component with a uniform distribution (noise). Columns: No lesion; Goal lesion (all); Goal lesion (gen>1); Social lesion; Memory lesion.
In another set of control experiments (Table 2), the precision of each navigational component was varied from low (wide Von Mises distribution) to very high (narrow distribution).Wider distributions reduced intergenerational efficiency increases, and a wide goal component even prevented agents from completing their routes.Narrower distributions effected less change, although a more precise goal component did subtly reduce increases in intergenerational efficiency.The pattern of cumulative route improvements in the experimental but not in the pair and solo control conditions was apparent throughout.
The lesion experiments suggest goal direction, social proximity, and route memory were all crucial for cumulative route improvements to emerge in this model.

Artificial navigator model fits empirical data
When fitted on 10 repetitions of the experimental condition in pigeons (data published by [20]), average parameter estimates were w_goal = 0.12 (SEM = 0.03), w_social = 0.16 (SEM = 0.03), w_memory = 0.09 (SEM = 0.01), and w_continuity = 0.59 (SEM = 0.03). That these weights did not align with optima for intergenerational improvement or efficiency for agents suggests that the artificial navigator model is insufficient to capture the full complexity of pigeon behaviour, which agrees with interpretations put forward by others [16,20].

Table 2. This table illustrates how the precision of each navigation component impacts cumulative route improvements, which are quantified by a positive intergenerational increase in path efficiency in the "experimental" condition ("pair" and "solo" are control conditions). The "normal precision" column reflects scores from the current model parameters. The "high precision" and "low precision" columns reflect halving and doubling the standard deviation, respectively, which is then transformed back into precision parameter κ for a navigational component's Von Mises distribution.

Discussion
The minimal cognitive architecture of goal-direction, social proximity, and long-term memory was sufficient for the emergence of cumulative route improvements.It was driven by regression to the goal over generations: as agents in a new pair aligned and converged their headings, experienced agents travelled along a remembered route, while their naive counterparts introduced a subtle goal-directed bias.These results suggest that stepwise improvement between generations can occur when individuals simply seek proximity to others.Agents had no capacity or intent to communicate, but information transferred between them as naive agents followed and memorised experienced agents' routes, while their subtle goal-directed pull introduced stepwise improvement between generations.While previous work has demonstrated step-wise improvement between generations through emulative learning [6,7], tasks required strategic social learning and advanced cognitive skills.Another difference is that the current task presents a clear limit: when the direct path between start and goal is reached, no further efficiency improvements can be made.The current findings outline a minimal set of cognitive abilities that is necessary and sufficient for cumulative route improvements to emerge.
The identified minimal set of cognitive abilities predicts that species with similar architectures could also show cumulative improvements, with a potential example in social ants who navigate along idiosyncratic one-way routes using landmarks [35].My results also demonstrate a role for naive individuals in cumulative improvements.This aligns with findings from bluehead wrasse, which use traditional mating sites (like paired agents stick to idiosyncratic routes), but adopt new sites upon complete population replacement [36,37].It also aligns with empirical work in great tits, in which population turnover drove cumulative improvements in efficiency due to new naive individuals adopting efficient variants [38].

Do the current findings extend to cumulative cultural evolution?
Examples of behaviour described as "cumulative culture" in animals often do not meet core criteria of a particular framework [19], although cumulative route improvement in pigeons has often been interpreted as meeting all core criteria [16,19] at least for Type I cumulative culture [4].While the current model reproduces cumulative route improvements in successive generations, it could be argued that its "innovation" is unlike human innovation, and that its "social learning" is without the traditional demonstrator or observer.
Not all human cumulative culture is Type II.For example, humans can increase a wheel's speed over several generations without gaining understanding of the physics behind their solutions [39].Another example is language, in which systematic structure can develop over generations that share the goal of communicating effectively for the benefit of naive individuals [40].In these situations, there are clear goals (meaning efficiency can be optimised), memory in experienced agents, and transfer to naive agents through implicit means.While the current model does not readily extend to these situations as it is, there is opportunity to explore the overlap between following another individual along a route and following an individual's actions or utterances.
The difference between cumulative culture in humans and other animals is typically described as a qualitative distinction [4,19].Some simulations suggest that this distinction could arise from a quantitative difference in the fidelity of information communication [41].Specifically, high fidelity reduced the loss rate of cultural traits, and if this breached a threshold it allowed traits to survive long enough to be recombined, which in turn led to cumulative culture.Through an optimistic lens, this could be taken to suggest that rudimentary aspects of cumulative culture found in animals are on the same continuum as cumulative culture found in humans.However, even if that holds for other nonhuman individuals, the artificial navigators introduced here fall well below the required fidelity threshold.Hence, if someone did consider the current model an example of rudimentary "cumulative culture," it would never pass beyond simple optimisation of efficiency.

Conclusions
In sum, artificial agents with minimal cognition reproduce cumulative route improvements previously shown in pigeons.This is qualitatively different from cumulative culture in humans, and it is unclear whether the current model extends to more complex situations.However, these findings do suggest that cumulative improvement across generations could be an emergent property in animals that work towards a goal alongside more experienced individuals.

Artificial navigator model
Artificial navigators were agents that embarked on journeys from a set starting point to a set goal, although they did not always reach this goal. They were bound by 4 rules, each implemented as an iterative sampling process from a Von Mises distribution. The centre of each distribution was determined by a bearing, and the spread by certainty of information. At each time point, an agent's heading was updated by sampling each distribution and computing a weighted circular mean (Eq 1). Weights were set at agent initialisation and added up to 1. Precision parameters were based on empirical data (see under "Experimental design").
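One way to write the weighted circular mean of the four samples is sketched below. This is an illustrative reading of the verbal description, not necessarily the exact form of Eq 1. The index k and the symbols θ_k and μ_k are introduced here for illustration only: θ_k is the sample drawn for component k, and μ_k is that component's centre (the goal bearing, the social bearing, the landmark bearing, or the current heading).

$$\theta_k \sim F(\mu_k, \kappa_k), \qquad k \in \{\mathrm{goal}, \mathrm{social}, \mathrm{memory}, \mathrm{continuity}\}$$

$$h_{i,t+1} = \operatorname{atan2}\left(\sum_k w_k \sin\theta_k,\; \sum_k w_k \cos\theta_k\right), \qquad \sum_k w_k = 1$$

Averaging sines and cosines rather than raw angles keeps the result well defined when samples fall on either side of the ±π wrap-around.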
The first rule was goal direction. The centre of this distribution was the bearing towards the goal b_goal, its precision parameter was κ_goal, and its weight w_goal. The bearing was computed from the coordinates of the goal (x_goal, y_goal) and agent at time t (x_t, y_t) (Eq 2). The purpose of this rule was to orient agents towards the goal, just like pigeons can orient homewards upon being released from unfamiliar sites. This ability likely depends on the sun, as starlings and pigeons can learn to use light and time-of-day to orient towards rewards [21], and pigeons orient homeward when the sun is visible [23]. They can even do so when it is overcast, but their initial orientation becomes more random when magnets are glued to the back of their heads [23], suggesting that pigeons also use an internal compass. For more comprehensive overviews, see [22] and [29].
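As a sketch of what Eq 2 expresses, assuming it follows the same atan2 convention as the bearings towards other agents and towards landmarks, the goal bearing would take the form:

$$b_{\mathrm{goal}} = \operatorname{atan2}\left(y_{\mathrm{goal}} - y_t,\; x_{\mathrm{goal}} - x_t\right)$$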
The second rule was social proximity. This distribution is a weighted composite of a Von Mises distribution for social convergence that is centred on the bearing towards another agent's estimated future position b_other and another Von Mises distribution for social alignment that is centred on another agent's current relative heading. The alignment of headings between agents at close proximity is a crucial part of flocking behaviour [42], but at larger distances agents need to converge rather than align to achieve social proximity. Samples drawn from the convergence distribution were weighted with proportion p and those drawn from the alignment distribution with (1-p). Proportion p was drawn from a cumulative normal distribution with mean 0.5 and standard deviation 0.1, which is equivalent to a distance of 30 metres, at which pigeons are estimated to be able to recognise individuals [43]. Both composite distributions have precision parameter κ_social, and the combined distribution has weight w_social.
Bearings towards other agents were computed from an agent's position at time t, (x_t, y_t), and other agent j's expected position at time t+1 (Eq 3). The expected position of agent j at time t+1 was estimated on the basis of their velocity v (which was kept constant) and their heading h_{j,t} at time t (Eq 4).

$$b_{\mathrm{other}} = \operatorname{atan2}\left(\hat{y}_{j,t+1} - y_t,\; \hat{x}_{j,t+1} - x_t\right) \quad \text{(Eq 3)}$$

The third rule was route memory. This was established during an agent's first journey, in which passed landmarks were committed to memory. Across the map of 200 by 130 units, 6,500 landmarks were spread. This aligns with landmark detection using pigeon flight routes [26] and edge detection in aerial photography [25]. During consecutive journeys, an agent attempted to fly from one memorised landmark to the next by sampling from a Von Mises distribution centred on the bearing towards the next landmark b_landmark, with precision κ_memory,i for journey i, and weight w_memory (Eq 5; see Fig C in S1 File). There were no memorised landmarks in the first journey, so κ_memory,1 was set to 0, resulting in a completely uniform distribution. For all following journeys, κ_memory,i was set to 1.82, 2.29, 2.98, 4.19, and then plateaued at 6.78. This was analogous to a linear decrease in standard deviation from 0.9 to 0.4 and was based on model fits to pigeon homing data (see under "Data reduction and statistics: Pigeons"). Agents proceeded to navigate towards the next landmark l+1 if they came within a threshold distance of landmark l. This threshold was set as 10 times the distance agents could travel between time t and time t+1.
The gradual improvement in memory precision over several journeys and the anchoring to landmarks were based on Gaussian process models of pigeon navigation [26]. While the current implementation was less elegant than its inspiration, it was computationally inexpensive, and parsimonious with sampling from distributions of other bearings.

$$b_{\mathrm{memory}} = \operatorname{atan2}\left(y_{\mathrm{landmark},l} - y_t,\; x_{\mathrm{landmark},l} - x_t\right) \quad \text{(Eq 5)}$$

The fourth and final rule was continuity. This ensured that during journey i, an agent's next heading at time t+1 would be similar to their heading at time t. The continuity component was sampled from a Von Mises distribution centred on current heading h(t), with precision parameter κ_continuity, and weight w_continuity.
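The four rules can be put together in a minimal sketch of a single heading update. It is not the published implementation. Several details are assumptions made for illustration: the other agent's expected position is taken to be its current position advanced by v along its heading, the alignment component is centred on the other agent's heading, the proportion p is passed in rather than derived from distance, and all function and parameter names are hypothetical.

```python
import math
import random


def bearing(x_from, y_from, x_to, y_to):
    """Bearing in radians from one point to another (atan2 convention)."""
    return math.atan2(y_to - y_from, x_to - x_from)


def weighted_circular_mean(angles, weights):
    """Weighted mean of angles, computed via summed sines and cosines."""
    s = sum(w * math.sin(a) for a, w in zip(angles, weights))
    c = sum(w * math.cos(a) for a, w in zip(angles, weights))
    return math.atan2(s, c)


def next_heading(pos, heading, goal, landmark, other_pos, other_heading,
                 weights, kappas, p=0.5, v=1.0, rng=random):
    """Sample one Von Mises draw per rule and combine them.
    weights and kappas are dicts keyed by rule name."""
    x, y = pos
    # Assumed form of the other agent's expected position at t+1.
    ox = other_pos[0] + v * math.cos(other_heading)
    oy = other_pos[1] + v * math.sin(other_heading)
    # Social component: convergence weighted p, alignment weighted 1 - p.
    convergence = rng.vonmisesvariate(bearing(x, y, ox, oy), kappas["social"])
    alignment = rng.vonmisesvariate(other_heading, kappas["social"])
    samples = {
        "goal": rng.vonmisesvariate(bearing(x, y, *goal), kappas["goal"]),
        "social": weighted_circular_mean([convergence, alignment], [p, 1 - p]),
        # kappa = 0 (first journey) yields a uniform sample.
        "memory": rng.vonmisesvariate(bearing(x, y, *landmark), kappas["memory"]),
        "continuity": rng.vonmisesvariate(heading, kappas["continuity"]),
    }
    names = list(samples)
    return weighted_circular_mean([samples[n] for n in names],
                                  [weights[n] for n in names])


# Example with the fitted pigeon weights and the fixed precision parameters.
weights = {"goal": 0.12, "social": 0.16, "memory": 0.09, "continuity": 0.59}
kappas = {"goal": 1.54, "social": 2.18, "memory": 6.78, "continuity": 8.69}
h = next_heading((0, 0), 0.0, (104, 0), (5, 1), (1, 2), 0.1,
                 weights, kappas, rng=random.Random(1))
print(h)
```

Because each call draws fresh samples, repeated updates from the same state produce slightly different headings. That variability is what a naive partner's goal-biased samples can act on.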

Experimental design
Agents travelled in 3 conditions that mapped onto work in pigeons [16]: solo, paired, and in an experimental condition with generational turnover. In the solo and pair conditions, 1 or 2 agents made 60 consecutive journeys. In the experimental condition, a naive agent replaced an experienced agent every 12 journeys.
Agents travelled 1 distance unit per 1 time unit, attempting to find a fixed goal from a fixed starting point that were 104 units apart.The maximum distance agents were allowed to travel was 2,506 units.Compared to the map used by pigeons in Sasaki and Biro's study [16], this is equivalent to a flight of 200 km and approximately 5 h.This cut-off was chosen because pigeons would have suffered continuously increasing concentrations of uric acid and other metabolites [51], and a marked increase in reactive oxygen metabolites and decrease in serum antioxidant capacity [52].
Weight parameters w goal and w social varied from 0.05 to 0.35 in steps of 0.05, and w memory from 0.05 to 0.5 in steps of 0.05, resulting in 610 unique combinations.No combinations with weight sums over 1 were included, and w continuity made up the difference for all weight sums under 1.A total of 50 repetitions were done for each condition and each unique combination of parameters, resulting in a total of 30,500 simulations.

Data reduction and statistics: Pigeons
Individual pigeon GPS data (defined by latitude and longitude) published by others [53] was converted to Universal Transverse Mercator (UTM) coordinates (grid zone 30U). Samples with velocities under 25 or over 150 km/h were excluded from flights, to filter breaks and apparent GPS glitches. Flights were completely excluded if they contained coordinates further than 17.03 km (twice the start-goal distance) away from the point midway between start and goal. Out of 2,176 files in the original dataset, 6 were excluded for straying too far off course, and 45 for not reaching the goal. Sasaki and Biro [16] also imputed several early incomplete flights with direct-to-home trajectories, which was not done here to avoid fitting models to imputed data, but the pattern of results matches nevertheless (Fig 2).
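The two exclusion rules can be sketched as below, assuming coordinates are already in UTM metres and per-sample speeds are available in km/h. The function names and data layout are hypothetical and are not taken from the published analysis code.

```python
import math


def keep_sample(speed_kmh, min_kmh=25.0, max_kmh=150.0):
    """Keep a GPS sample only if its speed lies within the flight range."""
    return min_kmh <= speed_kmh <= max_kmh


def strays_too_far(track, start, goal, factor=2.0):
    """True if any point lies further than factor times the start-goal
    distance from the midpoint between start and goal."""
    mid = ((start[0] + goal[0]) / 2.0, (start[1] + goal[1]) / 2.0)
    limit = factor * math.dist(start, goal)
    return any(math.dist(point, mid) > limit for point in track)


# Start and goal 8,515 m apart, which gives the 17.03 km limit.
start, goal = (0.0, 0.0), (8515.0, 0.0)
print(strays_too_far([(4000.0, 100.0)], start, goal))
print(strays_too_far([(4257.5, 17031.0)], start, goal))
```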
Best parameter fits for pigeon flight data were determined through maximum likelihood estimation.This is an established way of deriving parameter estimates for mixture models of Von Mises distributions, for example, in research on visual short-term memory [54].To speed up the fitting process, GPS data was downsampled to 0.05 Hz (1 sample every 20 s).

Data reduction and statistics: Agents
Simulation results were averaged between paired agents and over independent runs within the same condition and parameter settings. Efficiency for the final generation was computed as the highest out of 12 journeys in the fifth generation.
Generational efficiency improvement was computed as the average difference in route efficiency between consecutive generations. To reduce the impact of random fluctuations, the most efficient (typically the final) routes were taken as representative within each generation. The first generation in the experimental condition was omitted, to avoid comparisons between single and paired journeys.
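This reduction can be sketched as follows, assuming one run is stored as a flat list of 60 journey efficiencies in order, with 12 journeys per generation. The function names and the `skip_first` flag are hypothetical.

```python
def generation_best(efficiencies, journeys_per_generation=12):
    """Most efficient journey within each generation."""
    n = journeys_per_generation
    return [max(efficiencies[i:i + n]) for i in range(0, len(efficiencies), n)]


def mean_generational_increase(efficiencies, journeys_per_generation=12,
                               skip_first=False):
    """Average difference in best efficiency between consecutive
    generations; skip_first drops generation 1 (experimental condition)."""
    best = generation_best(efficiencies, journeys_per_generation)
    if skip_first:
        best = best[1:]
    diffs = [later - earlier for earlier, later in zip(best, best[1:])]
    if not diffs:
        raise ValueError("need at least two generations to compare")
    return sum(diffs) / len(diffs)


# Toy run: each generation's best route is 0.1 more efficient than the last.
run = []
for g in range(5):
    run.extend([0.3 + 0.1 * g] * 11 + [0.5 + 0.1 * g])
print(mean_generational_increase(run, skip_first=True))
```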
To avoid trivial statistical significance that can be achieved through increasing the number of simulations, inferences on the basis of statistical tests were avoided and were instead made on the basis of holistic interpretation.Readers are invited to scrutinise figures, data, models, and software.

Fig 2. Progression of route efficiency as a function of flight number. The top panel shows results for the optimum for final efficiency, the middle for the optimum for intergenerational improvement, and the bottom panel for pigeon data published by others [20]. Lines show mean values over independent runs, with 95% confidence intervals as shaded areas. In the experimental condition, a naive agent replaced an experienced one in each generation; in the solo condition, a single agent made all journeys with no generational turnover; and in the pair condition, 2 agents journeyed together without turnover. Parameters for the navigation model were the same between each of the 3 conditions, and weights are listed above each panel. https://doi.org/10.1371/journal.pbio.3002644.g002

Fig 3. Each panel shows the difference in route efficiency between naive agents in the experimental condition (generational turnover) and the first 12 journeys from agents in the pair control condition (without generational turnover). Positive differences indicate that naive agents had better route efficiency compared to control. Each panel represents a combination of w_goal and w_social parameters, while darker lines indicate higher levels of w_memory. Lines represent averages across 50 independent runs and their shaded areas the 95% confidence interval. https://doi.org/10.1371/journal.pbio.3002644.g003